Abstract

A Local Orthogonal Polynomial Expansion (LOrPE) of the empirical density function is 
proposed as a novel method to estimate the underlying density. The estimate is constructed 
by matching localized expectation values of orthogonal polynomials to the values observed in 
the sample. LOrPE is related to several existing methods, and generalizes straightforwardly 
to multivariate settings. By manner of construction, it is similar to Local Likelihood Density 
Estimation (LLDE). In the limit of small bandwidths, LOrPE functions as Kernel Density 
Estimation (KDE) with high-order (effective) kernels inherently free of boundary bias, a natural 
consequence of kernel reshaping to accommodate endpoints. Faster asymptotic convergence rates 
follow. In the limit of large bandwidths, LOrPE is equivalent to Orthogonal Series Density 
Estimation (OSDE) with Legendre polynomials. We compare the performance of LOrPE to 
KDE, LLDE, and OSDE, in a number of simulation studies. In terms of mean integrated squared 
error, the results suggest that with a proper balance of the two tuning parameters, bandwidth 
and degree, LOrPE generally outperforms these competitors when estimating densities with 
sharply truncated supports. 

Keywords: boundary bias; kernel density estimation; local likelihood density estimation; mean 
integrated squared error; orthogonal series density estimation; sharply truncated support. 


1 Introduction 


Few areas of statistical inference receive as much attention as the classical problem of nonparametric 
density estimation. Taking as our basis for inference a random sample of observations $X_1, \ldots, X_n$
from an underlying continuous distribution with probability density function (PDF) $f(\cdot)$ defined
on the compact support $[a, b]$, the simplest starting point is the empirical density function (EDF)

$$f_{\mathrm{EMP}}(x) = \frac{1}{n} \sum_{i=1}^{n} \delta(x - X_i), \tag{1}$$


where $\delta(\cdot)$ is the Dirac delta function. If additionally we assume the existence of a first few
derivatives or that the PDF can have at most a few modes, a convolution of the EDF with a kernel
function, $K(\cdot)$, often provides a much better estimate by producing a weighted average of points
close to $x$. $K(\cdot)$ itself is usually chosen to be a symmetric continuous density with a scale parameter,
so that the resulting kernel density estimate (KDE) is

$$f_{\mathrm{KDE}}(x) = \frac{1}{h} \int K\!\left(\frac{x - s}{h}\right) f_{\mathrm{EMP}}(s)\, ds = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right). \tag{2}$$
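As a minimal sketch of the KDE in equation (2), the following evaluates the kernel average directly. The Gaussian kernel, the function names and the toy sample are illustrative assumptions, not choices made in the paper.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as an example kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, sample, h, kernel=gaussian_kernel):
    """Kernel density estimate at x: (1 / (n h)) * sum_i K((x - X_i) / h)."""
    n = len(sample)
    return sum(kernel((x - xi) / h) for xi in sample) / (n * h)

# Hypothetical toy sample, for illustration only.
sample = [0.1, 0.4, 0.45, 0.8]
print(kde(0.5, sample, h=0.2))
```

The bandwidth `h` controls the amount of smoothing; its selection is the subject of the discussion that follows.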

Texas Tech University, Department of Mathematics & Statistics, Lubbock, U.S.A.

Texas Tech University, Department of Physics, Lubbock, U.S.A.





The critical KDE tuning parameter is the bandwidth h. A convenient, tractable criterion which is 
typically used to optimize the choice of this parameter is the mean integrated squared error (MISE). 
For large sample sizes, the MISE can be expanded in powers of $n^{-1}$. The two leading terms in this
expansion are associated with the bias and variance of the estimator. Omission of all higher-order 
terms results in the asymptotic MISE (AMISE) approximation. 

Under regularity conditions, KDE is consistent, with an AMISE-optimal choice of bandwidth 
(h*) which depends on (computable) kernel moments and the (uncomputable) integrated squared 
curvature of $f$. Although the Epanechnikov kernel minimizes AMISE (is asymptotically optimal),
the choice of kernel is generally not as influential as the choice of bandwidth. See Silverman (1986),
Scott (1992) and Wand & Jones (1995) for detailed treatments of the subject, and Sheather (2004),
Wasserman (2006, ch. 6), and Givens & Hoeting (2013, ch. 10) for more concise surveys.

Although the optimal $h^*$ is unattainable in practice, there are several approaches to dealing with
this issue. They range from quick rules-of-thumb, or plug-in methods, to the more computationally- 
intensive bandwidth selection based on cross-validation (Heidenreich et al., 2013). Rather, the 
major drawback of KDE is that it suffers from boundary bias, particularly if $f$ is sharply truncated
at the edges of its support. In such bounded support settings, KDE fails to attain the optimal 
convergence rate (Jones, 1993). 

One of the earliest attempts at correcting this problem was truncation and reflection of boundary 
kernels (Silverman, 1986). Several solutions based on local or adaptive methods have since been 
proposed; see for example Malec & Schienle (2014) for a survey. A more general solution is to use
a local polynomial or local likelihood based approach (Hjort & Jones, 1996; Loader, 1996, 1999).
These methods, and in particular the local likelihood density estimation (LLDE) detailed in Loader 
(1999), alleviate boundary bias, but require the solution of nonlinear equations at each x, and are 
therefore slow to compute. (Hall & Tao, 2002, however, argue that KDE has distinct advantages
over LLDE in the absence of boundary effects.) Although adaptive kernels work fairly well (e.g.,
Chen, 1999; Kakizawa, 2004; Jones & Henderson, 2007), they presume some particular number of
derivatives is matched at the boundary, which affects their asymptotic performance.¹

There is thus a niche to be filled in the nonparametric density estimation literature by devising 
methods that alleviate the boundary bias issues in a more general way than the prescribed corrections
of adaptive methods, whilst attaining the optimal KDE convergence rates in the interior of the
support, and yet do all this in a computationally efficient manner. As will be argued, our proposed
method attains faster asymptotic convergence rates by virtue of using higher-order (effective) kernels.
The initial motivation for our quest comes from high energy physics experiments, where there
is a need to estimate the distribution of visible energy in jets (i.e., collections of particles moving in 
approximately the same direction) due to smearing by the detector resolution (Volobouev, 2011). 
The situation is complicated by the fact that the energy of any one jet has to be reconstructed from 
signals produced by multiple particles in an array of sensors in the measuring device (calorimeter) 
with non-linear response (Wigmans, 2000). 

It is sometimes possible to use parametric functions to model such distributions. The results 
are fair, but there is room for improvement. Borrowing from the methods in Thas (2010), one 
idea is to model the bulk of the distribution with a flexible parametric model (like Johnson curves, 
Elderton & Johnson, 1969), and describe the deviations from this model nonparametrically, in the
spirit of Yang & Marron (1999). This can be done with so-called "comparison distributions" (Thas,
2010, ch. 3). The basic approach is that if $g$ and $G$ denote respectively the PDF and cumulative
distribution function (CDF) of a generic member of the parametric Johnson curves, and if $\psi$ and
$\Psi$ denote the PDF and CDF of a distribution supported on [0,1], then $F(x) = \Psi(G(x))$ is also
a CDF, with

$$f(x) = g(x)\,\psi(G(x)), \tag{3}$$

as its corresponding PDF. (This procedure can be iterated given a sequence of CDFs $\{\Psi_1, \Psi_2, \ldots\}$
supported on [0,1].)

¹See the R library bde for a comprehensive implementation of density estimation methods on bounded supports.

This suggests one can model observations $\{x_i\}$ from $X \sim f$ by first approximating $f$ with
$g$, even if it proves to be somewhat inadequate, and then mapping the $\{x_i\}$ to the [0,1] interval
according to the transformation $y_i = G(x_i)$. The density of the $\{y_i\}$ can now be approximated,
either parametrically or nonparametrically, to yield an estimate of $\psi$, whence the final estimate of $f$ is obtained
from (3). In the case that $G = F$, the true CDF, we have of course that $G(X)$ is uniform on [0,1],
a fact which can be used to assess the appropriateness of the initial $G$ (e.g., via the comparison
distribution methodology outlined in Thas, 2010, ch. 3). This is precisely where improved versions 
of KDE come in; they are needed to handle the sharply truncated support boundaries of the density 
of the {yi} resulting from this approach. 
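The two-step construction around equation (3) can be sketched as follows. This is only an illustration under stated assumptions: a fitted normal distribution stands in for the Johnson curve $g$, and a crude histogram stands in for the boundary-aware estimator of $\psi$ on [0,1]; all function names and the simulated data are hypothetical.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def histogram_density(y, values, bins=10):
    """Crude density estimate on [0, 1]; a stand-in for a boundary-aware estimator."""
    idx = min(int(y * bins), bins - 1)
    count = sum(1 for v in values if min(int(v * bins), bins - 1) == idx)
    return count * bins / len(values)

def transformed_density(x, sample, bins=10):
    """Estimate f(x) = g(x) * psi(G(x)) with a fitted normal as the initial model g."""
    n = len(sample)
    mu = sum(sample) / n
    sigma = math.sqrt(sum((s - mu) ** 2 for s in sample) / (n - 1))
    y = [normal_cdf(s, mu, sigma) for s in sample]          # y_i = G(x_i)
    psi_hat = histogram_density(normal_cdf(x, mu, sigma), y, bins)
    return normal_pdf(x, mu, sigma) * psi_hat

# Simulated data, for illustration only.
rng = random.Random(0)
data = [rng.gauss(2.0, 1.5) for _ in range(200)]
print(transformed_density(2.0, data))
```

Because the histogram integrates to one on [0,1], the product $g(x)\,\hat\psi(G(x))$ integrates to one as well, whatever the quality of the initial model.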

In multivariate problems, an attractive density estimation approach consists in decomposing 
the estimated density into the product of the copula density and of the marginals (Gijbels & 
Mielniczuk, 1990). As the copula density is defined on the unit hypercube, KDE of the copula 
density suffers considerably from boundary bias. While a number of methods have been proposed 
for alleviating this deficiency (as reviewed in Charpentier et al., 2006; see also Chen & Huang,
2007), the asymptotic convergence rate of these methods at the boundary is nevertheless inferior 
to the convergence rate inside the hypercube. 

With this backdrop, we propose the use of local orthogonal polynomial expansion (LOrPE) as 
a new method to perform nonparametric density estimation. The theoretical development and 
genesis of LOrPE is discussed in section 2. Section 3 discusses connections with other methods:
KDE, LLDE, and orthogonal series density estimation (OSDE). In particular, we establish there
that LOrPE is equivalent to KDE with a high-order kernel for points well inside the support of
the PDF. Thus, and through appropriate choice of its tuning parameters (discussed in section 4),
LOrPE provides a general way to achieve adaptive (kernel) behavior, while also attaining optimal
asymptotic convergence rates. Section 5 examines the performance of LOrPE closely in some simulation
studies, in both oracle (best case) and non-oracle settings, with respect to the competitors
outlined in section 3. The paper concludes with an illustration on a real dataset.


2 Development of LOrPE 

LOrPE inherits several of its features from OSDE (Efromovich, 1999), and can in fact be thought 
of as a localized version of OSDE. With $f(x)$ a simple initial estimator such as (1), LOrPE amounts
to constructing a truncated orthogonal polynomial series expansion for the EDF near each point
$x_{\mathrm{fit}}$ where the density estimate is desired. (In practice, these points would usually be taken to be
uniformly spaced on a grid of values covering the support of the density.) For a chosen bandwidth
$h$, this expansion is

$$f_{\mathrm{LOrPE}}(x) = \sum_{k=0}^{M} c_k(x_{\mathrm{fit}}, h)\, P_k\!\left(\frac{x - x_{\mathrm{fit}}}{h}\right), \tag{4}$$

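where the polynomials $P_k(x)$ are constrained to satisfy the normalization condition

$$\frac{1}{h} \int_a^b P_j\!\left(\frac{x - x_{\mathrm{fit}}}{h}\right) P_k\!\left(\frac{x - x_{\mathrm{fit}}}{h}\right) K\!\left(\frac{x - x_{\mathrm{fit}}}{h}\right) dx = \delta_{jk}, \tag{5}$$

which, with $a_{\mathrm{fit}} = (a - x_{\mathrm{fit}})/h$ and $b_{\mathrm{fit}} = (b - x_{\mathrm{fit}})/h$, is equivalent to

$$\int_{a_{\mathrm{fit}}}^{b_{\mathrm{fit}}} P_j(y) P_k(y) K(y)\, dy = \delta_{jk}, \tag{6}$$

where $\delta_{jk}$ is the Kronecker delta and $K(\cdot)$ a suitably chosen kernel function. The coefficients
$c_k(x_{\mathrm{fit}}, h)$ are determined by

$$c_k(x_{\mathrm{fit}}, h) = \frac{1}{h} \int f(x)\, P_k\!\left(\frac{x - x_{\mathrm{fit}}}{h}\right) K\!\left(\frac{x - x_{\mathrm{fit}}}{h}\right) dx, \tag{7}$$

which, for $f(x) = f_{\mathrm{EMP}}(x)$, is equivalent to

$$c_k(x_{\mathrm{fit}}, h) = \frac{1}{nh} \sum_{i=1}^{n} P_k\!\left(\frac{x_i - x_{\mathrm{fit}}}{h}\right) K\!\left(\frac{x_i - x_{\mathrm{fit}}}{h}\right). \tag{8}$$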


Because negative values can occur, the proposed density estimate at $x = x_{\mathrm{fit}}$ is then $\max\{0, f_{\mathrm{LOrPE}}(x_{\mathrm{fit}})\}$.
In general, this does not result in a bona fide density function (similarly to OSDE), and thus the
final step in the process involves performing a renormalization over all grid points. The final (genuine)
density estimate at $x$ is denoted by $\hat f_{\mathrm{LOrPE}}(x)$. Generalizing LOrPE to a multivariate setting
is in principle straightforward, necessitating only a switch to multivariate orthogonal polynomial 
systems. 
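The construction of this section can be sketched in a few functions, assuming an Epanechnikov kernel, Gram-Schmidt orthonormalization driven by numerically integrated kernel moments, and a trapezoidal renormalization over the grid. These are illustrative choices with hypothetical names, not the authors' implementation.

```python
import math

def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def simpson(func, lo, hi, panels=400):
    """Composite Simpson rule on [lo, hi]; panels must be even."""
    step = (hi - lo) / panels
    total = func(lo) + func(hi)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * func(lo + i * step)
    return total * step / 3.0

def polyval(coefs, u):
    return sum(c * u ** j for j, c in enumerate(coefs))

def orthonormal_polys(degree, lo, hi, kernel):
    """Gram-Schmidt on 1, u, u^2, ... with weight `kernel` on [lo, hi], as in (6)."""
    moments = [simpson(lambda u, j=j: u ** j * kernel(u), lo, hi)
               for j in range(2 * degree + 1)]

    def inner(p, q):
        return sum(pa * qb * moments[a + b]
                   for a, pa in enumerate(p) for b, qb in enumerate(q))

    basis = []
    for k in range(degree + 1):
        p = [0.0] * k + [1.0]                        # monomial u^k
        for q in basis:                               # remove projections
            proj = inner(p, q)
            p = [pc - proj * (q[j] if j < len(q) else 0.0)
                 for j, pc in enumerate(p)]
        norm = math.sqrt(inner(p, p))
        basis.append([pc / norm for pc in p])
    return basis

def lorpe_at(x_fit, sample, h, degree, a, b, kernel=epanechnikov):
    """Raw expansion (4) evaluated at x = x_fit, with coefficients from (8)."""
    lo = max(-1.0, (a - x_fit) / h)                  # kernel support clipped to
    hi = min(1.0, (b - x_fit) / h)                   # [a_fit, b_fit]
    n = len(sample)
    value = 0.0
    for p in orthonormal_polys(degree, lo, hi, kernel):
        c_k = sum(polyval(p, (xi - x_fit) / h) * kernel((xi - x_fit) / h)
                  for xi in sample) / (n * h)
        value += c_k * polyval(p, 0.0)
    return value

def lorpe_on_grid(grid, sample, h, degree, a, b):
    """Truncate at zero, then renormalize over an equally spaced grid."""
    raw = [max(0.0, lorpe_at(g, sample, h, degree, a, b)) for g in grid]
    step = grid[1] - grid[0]
    area = step * (sum(raw) - 0.5 * (raw[0] + raw[-1]))
    return [v / area for v in raw]
```

For an evenly spread sample on [0,1], `lorpe_at(0.0, sample, 0.2, 1, 0.0, 1.0)` returns a value close to 1 at the boundary, because the polynomials are re-orthonormalized on the clipped interval.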

Equation (4) can be usefully generalized to include a taper function $t(k)$ as follows:

$$f_{\mathrm{LOrPE}}(x) = \sum_{k=0}^{\infty} t(k)\, c_k(x_{\mathrm{fit}}, h)\, P_k\!\left(\frac{x - x_{\mathrm{fit}}}{h}\right). \tag{9}$$


The idea of the taper function is to suppress high order terms gradually, instead of using a sharp 
cutoff at $M$. Also, as will be discussed in section 4.3, a particular definition of the taper function
allows for a simple extension of (4) to non-integer values of $M$. We will normally require that
t(0) = 1 in order to ensure correct asymptotic normalization, in addition to specifying that t{k) = 0 
for k > M. 

LOrPE admits an appealing interpretation in terms of the local density expansion (4), in which
the “localized” expectation values of the orthogonal polynomials $P_k(\cdot)$ are matched to their empirical
values calculated from the data sample. This heuristic interpretation can be understood by
making the following observation. Define the localized expectation (at $x_{\mathrm{fit}}$) of a function $\phi$ with
respect to kernel $K$ (bandwidth $h$) for a random variable $X \sim f$ as

$$E_{f}[\phi(X)] = \frac{\int \phi(x) K(x) f(x)\, dx}{\int K(x) f(x)\, dx}.$$

Then, upon setting $\phi(x) = P_k(x)$, note that

$$E_{f_{\mathrm{EMP}}}[P_k(X)] = \frac{c_k(x_{\mathrm{fit}}, h)}{c_0(x_{\mathrm{fit}}, h)} = E_{f_{\mathrm{LOrPE}}}[P_k(X)].$$


3 Connections With Other Methods 

This section explores the connections between LOrPE and KDE, OSDE, and LLDE. We will show 
that under certain conditions LOrPE is essentially equivalent to KDE (Theorem 1), while under
other conditions its behavior mimics OSDE (Theorem 2). Also, the local adjustments instituted
by LOrPE to reduce support boundary bias are very much in the spirit of LLDE. 
3.1 Kernel density estimation

In general, LOrPE behaves as a linear combination of KDEs with varying kernels. To see this, 
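define $K_k(z) = P_k(z)K(z)$, and note that from (7) with $f(x) = f_{\mathrm{EMP}}(x)$ we can write the expansion
coefficients as

$$c_k(x_{\mathrm{fit}}, h) = \frac{1}{h} \int K_k\!\left(\frac{x - x_{\mathrm{fit}}}{h}\right) f_{\mathrm{EMP}}(x)\, dx = f_{\mathrm{KDE}}(x_{\mathrm{fit}} \mid h, K_k),$$

where the notation $f_{\mathrm{KDE}}(x \mid h, K)$ emphasizes the dependence on bandwidth $h$ and kernel $K$. Thus
(4) can be written as a weighted linear combination of KDEs with varying (improper) kernels $K_k$,

$$f_{\mathrm{LOrPE}}(x) = \sum_{k=0}^{M} f_{\mathrm{KDE}}(x_{\mathrm{fit}} \mid h, K_k)\, P_k\!\left(\frac{x - x_{\mathrm{fit}}}{h}\right). \tag{10}$$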

The following proposition establishes a basic result concerning the families of orthogonal polynomials
arising from commonly used kernels.

Proposition 1 For commonly used kernels from the Beta family supported on $[-1, 1]$ (Epanechnikov,
Biweight, Triweight, etc.), condition (6) generates the normalized Gegenbauer polynomials
(up to a common multiplicative constant) at grid points $x_{\mathrm{fit}}$ sufficiently deep inside the support
interval, provided $h$ is small enough to guarantee that $a_{\mathrm{fit}} < -1$ and $b_{\mathrm{fit}} > 1$.

Proof. By definition, the normalized Gegenbauer polynomials, $P_j^{(\alpha)}(x)$, $j = 0, 1, \ldots$, are orthogonal
on $[-1, 1]$ with respect to the weight function $w(x) = (1 - x^2)^{\alpha - 1/2}$, for some $\alpha > -1/2$. This
means that

$$\int_{-1}^{1} P_j^{(\alpha)}(x) P_k^{(\alpha)}(x) w(x)\, dx = \delta_{jk}. \tag{11}$$

Noting that $w(x) = c_\alpha K(x)$, where $K(x) = c_\alpha^{-1} (1 - x^2)^{\alpha - 1/2} I_{[-1,1]}(x)$ is a beta kernel with associated
normalizing constant $c_\alpha = \sqrt{\pi}\, \Gamma(\alpha + 1/2) / \Gamma(\alpha + 1)$, equation (11) becomes

$$\int_{-1}^{1} P_j^{(\alpha)}(x) P_k^{(\alpha)}(x)\, c_\alpha K(x)\, dx = c_\alpha \int_{a_{\mathrm{fit}}}^{b_{\mathrm{fit}}} P_j^{(\alpha)}(x) P_k^{(\alpha)}(x) K(x)\, dx,$$

since $K(x) = 0$ outside of $[-1, 1]$ and $a_{\mathrm{fit}} < -1$ and $b_{\mathrm{fit}} > 1$. This requires extending the polynomials
so that they are also defined on $[a_{\mathrm{fit}}, b_{\mathrm{fit}}]$. While this extension is not unique, any reasonable
definition will do, e.g., by using the same coefficients as on the $[-1, 1]$ interval. Values of
$\alpha = 3/2, 5/2, 7/2, 9/2$ define respectively the Epanechnikov, Biweight, Triweight, and Quadweight
kernels. ■
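The orthogonality underlying Proposition 1 can be checked numerically. The sketch below uses the standard (unnormalized) Gegenbauer polynomials $C_n^{(\alpha)}$ from the usual three-term recurrence, together with a Simpson rule; the function names are illustrative. For $\alpha = 3/2$ the weight is proportional to the Epanechnikov kernel.

```python
def gegenbauer(n, alpha, x):
    """C_n^{(alpha)}(x) via n C_n = 2x(n + alpha - 1) C_{n-1} - (n + 2 alpha - 2) C_{n-2}."""
    if n == 0:
        return 1.0
    prev, curr = 1.0, 2.0 * alpha * x
    for k in range(2, n + 1):
        prev, curr = curr, (2.0 * x * (k + alpha - 1.0) * curr
                            - (k + 2.0 * alpha - 2.0) * prev) / k
    return curr

def weighted_inner(j, k, alpha, panels=2000):
    """Simpson approximation of int_{-1}^{1} C_j C_k (1 - x^2)^(alpha - 1/2) dx."""
    step = 2.0 / panels
    total = 0.0
    for i in range(panels + 1):
        x = -1.0 + i * step
        weight = max(0.0, 1.0 - x * x) ** (alpha - 0.5)
        coef = 1 if i in (0, panels) else (4 if i % 2 else 2)
        total += coef * gegenbauer(j, alpha, x) * gegenbauer(k, alpha, x) * weight
    return total * step / 3.0

# alpha = 3/2 gives the weight (1 - x^2), i.e. the Epanechnikov shape.
print(weighted_inner(1, 3, 1.5), weighted_inner(2, 2, 1.5))
```

Dividing each $C_n^{(\alpha)}$ by the square root of `weighted_inner(n, n, alpha)` gives polynomials normalized in the sense of (11).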




Remark 1 If $x_{\mathrm{fit}}$ is sufficiently close to the ends of the support $[a, b]$ relative to the kernel support,
then, since the kernel is used as the weight function in generating them, the polynomials
will vary depending on $x_{\mathrm{fit}}$, and the notation $P_k(\cdot, x_{\mathrm{fit}})$ would be more appropriate. This in turn
implies the kernels $K_k$ in (10) also depend on $x_{\mathrm{fit}}$, and will undergo adjustments near the boundary.
For example, with the Beta kernels of Proposition 1, the effective support of $K_k$ becomes
$[\max(-1, a_{\mathrm{fit}}), \min(1, b_{\mathrm{fit}})]$.

The following theorem establishes the main result that, when evaluated at grid points far from 
the support boundaries, LOrPE is equivalent to KDE with a high-order kernel. In particular,
this result implies that (under the appropriate conditions) LOrPE enjoys the same asymptotic
optimality results as does KDE. Unlike KDE however, LOrPE does not intrinsically suffer from
boundary bias because the orthogonality requirement imposed by (5) automatically adjusts the
shape of the (orthogonal) polynomials near the boundary.


Theorem 1 When evaluated at points $x_{\mathrm{fit}}$, (9) is equivalent to KDE with the effective kernel

$$K_{\mathrm{eff}}(x) = \sum_{k=0}^{\infty} t(k) P_k(0) P_k(-x) K(-x). \tag{12}$$

Under the following additional assumptions:

(a) $K(x)$ is an even kernel supported on some interval $(-a_K, a_K)$ that is symmetric about 0;

(b) $x_{\mathrm{fit}}$ is sufficiently far from the density support boundaries $[a, b]$ so that the $P_k(\cdot)$'s can be generated
on an interval of orthogonality that is symmetric about zero, and subsequently extended
to $[a_{\mathrm{fit}}, b_{\mathrm{fit}}]$ by keeping the same coefficients, where $a_{\mathrm{fit}} = (a - x_{\mathrm{fit}})/h$ and $b_{\mathrm{fit}} = (b - x_{\mathrm{fit}})/h$,
as in the proof of Proposition 1; and

(c) we have $a_{\mathrm{fit}} < -a_K < a_K < b_{\mathrm{fit}}$;

then the effective kernel (12):

(i) is an even function supported on $(-a_K, a_K)$;

(ii) is normalized provided $t(0) = 1$; and

(iii) is a high-order kernel if $t(k)$ is a step function, i.e. $t(k) = 1$ for all $k \le M$ and $t(k) = 0$ for
all $k > M$, in which case the kernel order is $M + 1$ if $M$ is odd and $M + 2$ if $M$ is even.


Proof. See the appendix. ■ 
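Result (iii) can be illustrated by computing the moments of the effective kernel (12) for an interior point. The sketch below assumes the Epanechnikov kernel and a step-function taper, and works with exact kernel moments so that no quadrature is needed; all names are illustrative.

```python
import math

def epan_moment(j):
    """Exact int_{-1}^{1} u^j * 0.75 (1 - u^2) du."""
    return 0.0 if j % 2 else 0.75 * (2.0 / (j + 1) - 2.0 / (j + 3))

def orthonormal_polys(degree):
    """Gram-Schmidt on monomials with the Epanechnikov weight on [-1, 1]."""
    def inner(p, q):
        return sum(pa * qb * epan_moment(a + b)
                   for a, pa in enumerate(p) for b, qb in enumerate(q))
    basis = []
    for k in range(degree + 1):
        p = [0.0] * k + [1.0]
        for q in basis:
            proj = inner(p, q)
            p = [pc - proj * (q[j] if j < len(q) else 0.0)
                 for j, pc in enumerate(p)]
        norm = math.sqrt(inner(p, p))
        basis.append([pc / norm for pc in p])
    return basis

def effective_kernel_poly(degree):
    """Coefficients of q(x) = sum_{k <= M} P_k(0) P_k(-x), so K_eff(x) = q(x) K(x)."""
    q = [0.0] * (degree + 1)
    for p in orthonormal_polys(degree):
        for j, c in enumerate(p):
            q[j] += p[0] * c * (-1.0) ** j    # P_k(0) = p[0]; odd terms flip sign
    return q

def effective_kernel_moment(degree, j):
    """int x^j K_eff(x) dx, using exact Epanechnikov moments."""
    return sum(qa * epan_moment(j + a)
               for a, qa in enumerate(effective_kernel_poly(degree)))

# M = 2: moments 1 to 3 vanish and the 4th does not, so the order is 4 = M + 2.
print([round(effective_kernel_moment(2, j), 6) for j in range(5)])
```

For $M = 2$ the polynomial factor works out to $1.875 - 4.375x^2$, which reproduces the familiar fourth-order kernel $\tfrac{15}{32}(1 - x^2)(3 - 7x^2)$.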

The local adjustments made by LOrPE near the support boundary are illustrated in Figure 1.
In these plots, the effective kernel $K_{\mathrm{eff}}$ is shown vs. $x - x_{\mathrm{fit}}$ for a density that is sharply truncated at
0. The normal density is used as the weight function, with bandwidth set at $h = 0.1$. Polynomials
up to degree M = 4 are considered. The plots correspond to LOrPE density estimation on the [0,1] 
interval for points: exactly at the boundary (left panel), close to the boundary (middle panel), and 
away from the boundary (right panel). 


3.2 Orthogonal series density estimation 


The key idea underlying OSDE for a univariate density can be traced back to at least Cencov 
(1962). Updated monograph-length treatments of the topic can be found in Tarter & Lock (1993)
and Efromovich (1999). There is a strong connection between LOrPE and OSDE. If $\{\phi_k\}$ is an
orthonormal basis and $f$ is square integrable, then the classical OSDE of $f(x)$ is

$$f_{\mathrm{OSDE}}(x) = \sum_{j=0}^{J} \hat\theta_j \phi_j(x), \quad \text{where} \quad \hat\theta_j = \frac{1}{n} \sum_{i=1}^{n} \phi_j(x_i). \tag{13}$$


The tuning parameters here consist of the choice of basis functions and their number, $J$, to carry
in the summation. In a more general form, and adapted for densities supported on $[a, b]$, this
estimator can be represented as

$$f_{\mathrm{OSDE}}(x) = \frac{1}{b - a} + \sum_{j=1}^{J} w_j \hat\theta_j \phi_j(x), \tag{14}$$
Figure 1: LOrPE effective kernel plots for a density that is sharply truncated at 0, corresponding
to support data points: (a) exactly at the boundary, (b) close to the boundary, and (c) away from 
the boundary. 


where the $w_j \in [0, 1]$ are shrinkage coefficients (Efromovich, 1999). Comparing (4) and (14), we see
immediately that LOrPE can be viewed heuristically as a localized version of OSDE, since the “basis
functions” $\{P_k\}$ in the former are not global, but adjust locally depending on $x_{\mathrm{fit}}$. Another facet
of the connection between these estimators is revealed in the following theorem, which establishes 
that, for large bandwidths, LOrPE is essentially equivalent to OSDE with a Legendre polynomial 
basis. 

Theorem 2 In the limit as $h \to \infty$, the LOrPE estimate (4) for PDF $f(x)$ with finite support $[a, b]$
reduces to classical OSDE in terms of the basis functions

$$\phi_j(x) = \sqrt{\frac{2}{b - a}}\; L_j\!\left(\frac{2x - a - b}{b - a}\right),$$

where the $\{L_j\}$ are orthonormal Legendre polynomials on $[-1, 1]$.
Proof. See the appendix. ■ 
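The large-bandwidth limit in Theorem 2 corresponds to the classical estimator (13) with a rescaled Legendre basis. A minimal sketch follows, assuming Bonnet's recurrence for the Legendre polynomials and the scaling $\sqrt{(2j+1)/(b-a)}$ that makes the basis orthonormal on $[a, b]$; the names are illustrative.

```python
import math

def legendre(j, y):
    """Legendre polynomial of degree j at y, via Bonnet's recurrence."""
    if j == 0:
        return 1.0
    prev, curr = 1.0, y
    for k in range(1, j):
        prev, curr = curr, ((2 * k + 1) * y * curr - k * prev) / (k + 1)
    return curr

def basis(j, x, a, b):
    """Orthonormal Legendre basis function on [a, b]."""
    y = (2.0 * x - a - b) / (b - a)
    return math.sqrt((2 * j + 1) / (b - a)) * legendre(j, y)

def osde(x, sample, degree, a, b):
    """Classical OSDE: sum_j theta_j * phi_j(x), theta_j = mean of phi_j(x_i)."""
    n = len(sample)
    total = 0.0
    for j in range(degree + 1):
        theta_j = sum(basis(j, xi, a, b) for xi in sample) / n
        total += theta_j * basis(j, x, a, b)
    return total

print(osde(0.3, [0.1, 0.35, 0.6, 0.9], degree=3, a=0.0, b=1.0))
```

With `degree=0` the estimate reduces to the uniform density $1/(b-a)$, which is the leading term in (14).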


3.3 Local likelihood density estimation 

In spirit (but not mathematical detail) LOrPE is also very similar to LLDE; see Hjort & Jones 
(1996), and Loader (1996, 1999). As observed by Loader (1999, ch. 5), LLDE overcomes boundary
bias by matching localized sample moments to population moments using the log-polynomial density
approximation (polynomial approximations on the log scale). As was noted in section 2, LOrPE
instead matches localized expectation values of orthogonal polynomials to their sample values using
polynomial density approximations (polynomial approximations on the original scale). Although
the LLDE approach may be theoretically superior, LOrPE enjoys the pragmatic advantages of computational
speed and numerical stability, as it does not involve the solution of non-linear equations
at every grid point.


4 Selection of Tuning Parameters 

This section discusses strategies for selecting the two LOrPE tuning parameters, bandwidth (h) and 
polynomial degree (M). (In principle the taper function t(-) could also be tuned, but for simplicity 
we restrict our attention to simple truncation.) We emphasize this dependence on tuning parameters 
by writing 

$$f_{\mathrm{LOrPE}}(x) = f_{\mathrm{LOrPE}}(x \mid h, M),$$

and discuss first an adaptation of the AMISE-optimal plug-in method for KDE (Silverman’s Rule).
Methods based on cross-validation are also proposed. The performance of these approaches will be
examined in section 5.


4.1 The plug-in approach 


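Note that from Theorem 1, LOrPE can be viewed as being equivalent to KDE with a high-order
kernel, $K_{\mathrm{eff}}$. The optimal AMISE expression for KDE with kernel function $K_{\mathrm{eff}}(\cdot)$ of order $r$ is
known to be (e.g., Wand & Jones, 1995),

$$\mathrm{AMISE}_{h^*}(r) = \frac{2r + 1}{2r} \left[ 2r\, (r!)^{-2}\, \mu_{K_{\mathrm{eff}}}(r)^2\, R_{K_{\mathrm{eff}}}(r)^{2r}\, R_{f^{(r)}}(r)\, n^{-2r} \right]^{1/(2r+1)}, \tag{15}$$

with corresponding optimal value of $h$,

$$h^*(r) = \left[ \frac{(r!)^2\, R_{K_{\mathrm{eff}}}(r)}{2r\, n\, \mu_{K_{\mathrm{eff}}}(r)^2\, R_{f^{(r)}}(r)} \right]^{1/(2r+1)}, \tag{16}$$

where $f^{(r)}$ denotes the $r$-th derivative of $f$, and

$$\mu_{K_{\mathrm{eff}}}(r) = \int x^r K_{\mathrm{eff}}(x)\, dx, \quad R_{K_{\mathrm{eff}}}(r) = \int K_{\mathrm{eff}}(x)^2\, dx, \quad R_{f^{(r)}}(r) = \int f^{(r)}(x)^2\, dx.$$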

The unknown moments $\mu_{K_{\mathrm{eff}}}(r)$ and $R_{K_{\mathrm{eff}}}(r)$ can be computed once the underlying kernel $K(\cdot)$
is selected; e.g., for Gaussian kernels we have Hermite polynomials, for beta kernels Gegenbauer
polynomials, etc. For sample sizes in the range 10^ < n < 10^, optimal values of $r$ are likely to
be relatively low, and thus these moments can be tabulated across a few $r$ values with a symbolic
mathematics computer package, and then included in the relevant programs.

The only real difficulty is estimation of $R_{f^{(r)}}(r)$, but as explained by Wand & Jones (1995),
a simple transformation leads to the expression $R_{f^{(r)}}(r) = (-1)^r \psi_{2r}$, and thus it suffices to study
estimation of functionals $\psi_s = E[f^{(s)}(X)]$, for $s$ even. For this Wand & Jones (1995) propose multistage
direct plug-in algorithms, involving the iteration of a KDE-type estimator of $\psi_r$ with optimal
bandwidth that depends on $\psi_s$, $s > r$. Starting with a rough estimate of $\psi_s$ at some stage, which
can be based on the well-known value corresponding to a normal density with standard deviation $\sigma$,
this is iterated to arrive at some estimate $\hat\psi_s$. A naive estimate of $\psi_{2r}$ follows by using an
estimate of $\sigma$ (e.g., sample standard deviation).² Plugging the resulting estimate of $R_{f^{(r)}}(r)$ into (15) gives eventually,

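$$\mathrm{AMISE}_{h^*}(r) = \frac{2r + 1}{4 r \hat\sigma} \left[ \frac{2r\, (2r)!}{(r!)^3}\, \frac{\mu_{K_{\mathrm{eff}}}(r)^2}{\sqrt{\pi}} \left( \frac{R_{K_{\mathrm{eff}}}(r)}{n} \right)^{2r} \right]^{1/(2r+1)}. \tag{17}$$

Now minimize (17) in $r$ to get $\hat r$ (which by Theorem 1 immediately provides also an estimate of
$M$). Finally, substitute $\hat r$ into (16) to obtain the estimates

$$\hat h_{\mathrm{AMISE}} = 2\hat\sigma \left[ \frac{(\hat r!)^3 \sqrt{\pi}\, R_{K_{\mathrm{eff}}}(\hat r)}{2\hat r\, (2\hat r)!\, n\, \mu_{K_{\mathrm{eff}}}(\hat r)^2} \right]^{1/(2\hat r+1)}, \tag{18}$$

and $\hat M_{\mathrm{AMISE}}$, which is determined from $\hat r$ (separately for $\hat r$ even and $\hat r$ odd) through result (iii) of Theorem 1.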


Of course, this can only serve as a rough estimate, the intent being to provide reasonable initial
values for a more refined search. The fact that LOrPE naturally self-adjusts near the support
endpoints complicates the calculation of the boundary contribution to the AMISE, as well as the
analysis of the bias introduced by the truncation of the reconstructed density when forced to be
non-negative (with subsequent renormalization).
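The plug-in bandwidth of this subsection can be sketched as below. The function takes the kernel moment and roughness as inputs and uses the normal-reference value $R_{f^{(r)}} = (2r)! / [(2\sigma)^{2r+1}\, r!\, \sqrt{\pi}]$, which is an assumption standing in for the multistage estimate; the function and argument names are illustrative.

```python
import math

def plug_in_bandwidth(n, sigma, r, mu_r, roughness):
    """Normal-reference bandwidth for a kernel of order r.

    mu_r      : r-th moment of the kernel, int x^r K(x) dx
    roughness : int K(x)^2 dx
    """
    # Roughness of the r-th derivative of a normal density with scale sigma.
    rough_f = math.factorial(2 * r) / (
        (2.0 * sigma) ** (2 * r + 1) * math.factorial(r) * math.sqrt(math.pi))
    numerator = math.factorial(r) ** 2 * roughness
    denominator = 2.0 * r * n * mu_r ** 2 * rough_f
    return (numerator / denominator) ** (1.0 / (2 * r + 1))

# Second-order Gaussian kernel: mu_2 = 1 and R(K) = 1 / (2 sqrt(pi)).
h = plug_in_bandwidth(n=100, sigma=1.0, r=2, mu_r=1.0,
                      roughness=1.0 / (2.0 * math.sqrt(math.pi)))
print(h)
```

For $r = 2$ with a Gaussian kernel this reduces to $(4/3)^{1/5}\, \sigma\, n^{-1/5} \approx 1.06\, \sigma\, n^{-1/5}$, the familiar rule of thumb.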

²For the case $r = 2$ in the context of KDE this is known as Silverman’s Rule.






















4.2 Cross-validation methods 


Least squares cross-validation (LSCV) for estimation of a generic PDF $f$ considers the integrated
squared error of the density estimate $\hat f$,

$$\mathrm{ISE} = \int \left[ \hat f(x) - f(x) \right]^2 dx. \tag{19}$$


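As proposed by Bowman (1984) and Hall (1983), this leads eventually to minimization of the LSCV
criterion. Applied to LOrPE, this yields

$$\mathrm{LSCV}(h, M) = \int_a^b \hat f_{\mathrm{LOrPE}}(x \mid h, M)^2\, dx - \frac{2}{n} \sum_{i=1}^{n} f^{(-i)}_{\mathrm{LOrPE}}(x_i \mid h, M), \tag{20}$$

where

$$f^{(-i)}_{\mathrm{LOrPE}}(x \mid h, M) = \frac{1}{(n-1)h} \sum_{k=0}^{M} \sum_{\substack{j=1 \\ j \ne i}}^{n} P_k\!\left(\frac{x_j - x_{\mathrm{fit}}}{h}\right) K\!\left(\frac{x_j - x_{\mathrm{fit}}}{h}\right) P_k\!\left(\frac{x - x_{\mathrm{fit}}}{h}\right)$$

is the leave-one-out LOrPE density estimate from (4), obtained by omitting the $i$-th observation.
As suggested in the literature (e.g., Sheather, 2004), the existence of multiple minima means that
it is prudent to plot $\mathrm{LSCV}(h, M)$ over a grid of $h$ and $M$ values. From an asymptotic perspective,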
the main drawback of this criterion is its slow rate of convergence. 
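The structure of the LSCV criterion can be sketched generically: an integral of the squared estimate minus twice the average leave-one-out value. In the sketch below a Gaussian KDE is used purely as a stand-in estimator (an assumption made for brevity, so only the bandwidth is tuned); any estimator with the same call signature could be substituted, and the integration limits and names are illustrative.

```python
import math

def gaussian_kde(x, sample, h):
    """Gaussian KDE, used here only as a stand-in for the LOrPE estimate."""
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample) / (
        len(sample) * h * math.sqrt(2.0 * math.pi))

def lscv(sample, h, estimator=gaussian_kde, lo=-6.0, hi=6.0, points=601):
    """LSCV(h) = int fhat(x)^2 dx - (2/n) * sum_i fhat^{(-i)}(x_i)."""
    step = (hi - lo) / (points - 1)
    squared = [estimator(lo + i * step, sample, h) ** 2 for i in range(points)]
    integral = step * (sum(squared) - 0.5 * (squared[0] + squared[-1]))
    # Leave-one-out values: refit without the i-th observation.
    loo = sum(estimator(xi, sample[:i] + sample[i + 1:], h)
              for i, xi in enumerate(sample))
    return integral - 2.0 * loo / len(sample)

sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
for h in (0.2, 0.5, 1.0):
    print(h, lscv(sample, h))
```

Printing the criterion over a grid of tuning values, as done here for `h`, is in line with the advice to inspect the whole surface rather than trust a single minimizer.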

A related simpler and intuitively appealing, but less popular, approach is likelihood cross-validation
(LCV), the essential idea dating back to at least Habbema et al. (1974) and Duin
(1976); see for example Silverman (1986) or Givens & Hoeting (2013, ch. 10) for an updated discussion.
This is based on taking the likelihood function of the leave-one-out density estimate above,
leading to maximization of

$$\mathrm{LCV}(h, M) = \prod_{i=1}^{n} f^{(-i)}_{\mathrm{LOrPE}}(x_i \mid h, M).$$

Reasoning that the density values at each point are taken from slightly different distributions (and
not from the same distribution as in a genuine likelihood), the term pseudo-LCV might perhaps be
more suitable.

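An obvious obstacle with implementation of this criterion is the situation when $f^{(-i)}_{\mathrm{LOrPE}}(x_i \mid h, M) = 0$
for some $i$. Its use is also problematic for densities with infinite support due to the strong influence
exerted by fluctuations in the tails. To avoid these situations a regularization condition can
be introduced, leading to the modified regularized LCV (RLCV) criterion,

$$\mathrm{RLCV}(h, M) = \prod_{i=1}^{n} \max\left\{ f^{(-i)}_{\mathrm{LOrPE}}(x_i \mid h, M),\; \frac{f^{(+i)}_{\mathrm{LOrPE}}(x_i \mid h, M)}{n^{\alpha}} \right\}, \tag{21}$$

where $\alpha > 0$ is the regularization parameter, and

$$f^{(+i)}_{\mathrm{LOrPE}}(x \mid h, M) = \frac{1}{h} \sum_{k=0}^{M} P_k\!\left(\frac{x_i - x_{\mathrm{fit}}}{h}\right) K\!\left(\frac{x_i - x_{\mathrm{fit}}}{h}\right) P_k\!\left(\frac{x - x_{\mathrm{fit}}}{h}\right)$$

is the contribution of data point $x_i$ toward the LOrPE density estimate (4). Note therefore that
for each $i = 1, \ldots, n$ we have

$$f_{\mathrm{LOrPE}}(x \mid h, M) = \frac{n-1}{n}\, f^{(-i)}_{\mathrm{LOrPE}}(x \mid h, M) + \frac{1}{n}\, f^{(+i)}_{\mathrm{LOrPE}}(x \mid h, M).$$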
The case for regularizing LCV was made as early as Schuster & Gregory (1981) who remarked
that for tails exponential and heavier, the use of LCV without regularization results in inconsistent
density estimates. From a large number of simulations, we have noted that $\alpha = 0.5$ is a reasonable
default value. Of course, one can also add $\alpha$ to the list of tuning parameters to be selected via
RLCV, a possibility that will be explored in section 5.
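The regularized criterion can be sketched on the log scale, where the product becomes a sum and the maximizer is unchanged. The inputs are the leave-one-out values and the self-contributions at each data point; in this sketch they are produced by a Gaussian KDE, which is an assumption standing in for LOrPE, and the function names are illustrative.

```python
import math

def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def loo_and_self(sample, h):
    """Leave-one-out values and self-contributions at each data point,
    computed for a Gaussian KDE as a stand-in for LOrPE."""
    n = len(sample)
    loo = [sum(gaussian((xi - xj) / h) for j, xj in enumerate(sample) if j != i)
           / ((n - 1) * h) for i, xi in enumerate(sample)]
    own = [gaussian(0.0) / h for _ in sample]
    return loo, own

def log_rlcv(loo, own, alpha=0.5):
    """Sum over i of log max{ loo_i, own_i / n^alpha }."""
    n = len(loo)
    return sum(math.log(max(l, s / n ** alpha)) for l, s in zip(loo, own))

# An isolated observation makes its leave-one-out value underflow to zero,
# which would break plain LCV; the regularized version stays finite.
sample = [0.0, 0.1, 0.2, 50.0]
loo, own = loo_and_self(sample, h=0.1)
print(loo[3], log_rlcv(loo, own))
```

The floor `own / n ** alpha` shrinks as the sample grows, so the regularization only matters for points that are poorly supported by their neighbors.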


4.3 Effective degrees of freedom and shrinkage 


In situations where truncation of the density below zero is unnecessary, LOrPE functions as a linear
smoother of the EDF, analogously to KDE. This can be seen by taking the definition of the KDE
effective kernel from Theorem 1, and observing that we can write (9) as

$$f_{\mathrm{LOrPE}}(x) = \frac{1}{h} \int K_{\mathrm{eff}}\!\left(\frac{x - y}{h}\right) f_{\mathrm{EMP}}(y)\, dy.$$

This suggests the possibility of adapting the idea of effective degrees of freedom for linear smoothers
in a regression setup (Buja et al., 1989), to the analogous situation of density estimation. If $S$ is
the smoothing matrix, the first of three sensible definitions for the effective degrees of freedom in
a linear smoother, as given by Buja et al. (1989), is $\mathrm{tr}(SS^T)$.

For an arbitrary bandwidth, calculation of this trace appears to be analytically intractable due 
to edge effects. However, in the limit as $h \to \infty$, recall from Theorem 2 that LOrPE converges
to OSDE in terms of Legendre polynomials. Now, for a density fit by a polynomial of degree M, 
the number of degrees of freedom of the fit (number of free parameters) is obviously M (M + 1 
coefficients minus the one constraint from normalizing the PDF). As the effective degrees of freedom 
in a smoother is not limited to integers, this motivates a natural extension of LOrPE to non-integer 
values of M. Through suitable choice of the taper function, we can ensure that the effective degrees 
of freedom in any given fit is always M. 

To formalize this, consider without loss of generality a PDF supported on $[-1, 1]$. With $t(\cdot)$
a chosen taper function and the $\{L_k\}$ defined as in Theorem 2, OSDE smoothing is then seen
to be performed by the linear operator $S(x, y) = \sum_{k=0}^{\infty} t(k) L_k(x) L_k(y)$, in an appropriate inner
product space. Requiring the inner product with the EDF to yield OSDE motivates the following
definition:

$$\int_{-1}^{1} S(x, y) f_{\mathrm{EMP}}(y)\, dy = \frac{1}{n} \sum_{i=1}^{n} \sum_{k=0}^{\infty} t(k) L_k(x) L_k(x_i).$$

This is now in the form of (14), with the $t(k)$ playing the role of the shrinkage coefficients $w_k$. The
operator $S$ is in fact self-adjoint (symmetric), so that

$$\begin{aligned}
SS^T = \langle S(x, z), S(z, y) \rangle &= \int_{-1}^{1} S(x, z) S(z, y)\, dz \\
&= \int_{-1}^{1} \sum_{k=0}^{\infty} t(k) L_k(x) L_k(z) \sum_{j=0}^{\infty} t(j) L_j(z) L_j(y)\, dz \\
&= \sum_{k=0}^{\infty} t^2(k) L_k(x) L_k(y),
\end{aligned}$$

the last line following from identity (24). Additionally, note that we have

$$\mathrm{tr}(SS^T) = \int_{-1}^{1} \int_{-1}^{1} (SS^T)(x, y)\, \delta(x - y)\, dx\, dy = \int_{-1}^{1} \sum_{k=0}^{\infty} t^2(k) L_k^2(x)\, dx = \sum_{k=0}^{\infty} t^2(k).$$

Adapting the above definition for the effective degrees of freedom from Buja et al. (1989) to density
estimation, we therefore arrive at the identity

$$M = \mathrm{tr}(SS^T) - 1 = \sum_{k=0}^{\infty} t^2(k) - 1. \tag{22}$$


There are many possible choices for $t(\cdot)$ which would make (22) work, but perhaps the simplest
is to take the step function approach of section 3.1. However, if the optimal $M$ is not an integer,
an extra adjustment is needed, so that a more general prescription (with $m = \lfloor M \rfloor$ denoting the
largest integer less than or equal to $M$) is to define:

$$t(k) = \begin{cases} 1, & k \le m, \\ \sqrt{M - m}, & k = m + 1, \\ 0, & k \ge m + 2. \end{cases}$$

Throughout the paper, we adopt these shrinkage coefficients in all instances where LOrPE is applied, 
for any given bandwidth h. 
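The taper prescription and identity (22) can be checked with a few lines. The function names and the truncation of the infinite sum at 50 terms are illustrative assumptions.

```python
import math

def taper(k, M):
    """Shrinkage coefficient t(k) for a possibly non-integer degree M."""
    m = math.floor(M)
    if k <= m:
        return 1.0
    if k == m + 1:
        return math.sqrt(M - m)
    return 0.0

def effective_dof(M, terms=50):
    """Sum of t(k)^2 minus one; should reproduce M whenever M + 2 <= terms."""
    return sum(taper(k, M) ** 2 for k in range(terms)) - 1.0

for M in (2, 2.3, 4.75):
    print(M, effective_dof(M))
```

The fractional part of $M$ enters only through the weight of the single term $k = m + 1$, so the estimate changes continuously as $M$ moves between integers.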


5 Simulations 

The primary goal of this section is to compare the MISE performance of LOrPE with that of its 
main competitor, KDE. This will be done both from oracle and non-oracle based perspectives. 
The oracle based comparisons, so called because the optimization has access to the true analytical 
ISE, are aimed at benchmarking the performance of the two methods, especially with regard to 
estimating densities that are sharply truncated. The non-oracle based comparisons will explore 
the performance of LOrPE to all of its rivals and analogues discussed thus far: KDE, OSDE, and 
LLDE. 

To produce a spanning set of densities / to be investigated, some elements from the list in 
Wand & Jones (1995, Table 2.2) were employed as a starting point. This includes the KDE-optimal
Beta(4,4) on $(-1, 1)$, as derived by Terrell (1990) for minimizing AMISE through minimization of
total curvature. To these were added a few that are sharply truncated. Table 1 lists the choice of
distributions selected for the simulation study, where $\phi(z)$ and $\Phi(z)$ denote the PDF and CDF of a
standard normal. In particular, there are three distributions with sharp boundaries: two standard 
normals, one truncated at 0 and the other at —1, and a standard exponential. It is expected that 
KDE will handle the N(0,1) truncated at 0 well using data reflection (or mirroring), and it would 
therefore be interesting to compare its performance with that of LOrPE which does not enjoy this 
advantage. On the other hand, we would expect to see LOrPE outperform KDE for the N(0,1) 
truncated at —1, as the data reflecting method doesn’t work well in this case (due to discontinuity 
of the first derivative). 

5.1 Oracle MISE comparisons: LOrPE vs. KDE 

The “oracle” MISE comparisons, called “best case” by Jones & Henderson (2007), are useful for 
benchmarking LOrPE vs. KDE in determining the best possible performance for each method 
with regard to estimation of a particular density. Dassanayake (2014) details the procedure used 
to effect these comparisons for each of the distributions in Table 1. This involves performing a 
computationally intensive search for the optimal $h^*$ and $M^*$ that minimize the MISE over grids 
of polynomial degree values, $M \in \mathcal{M}$, and bandwidths, $h \in \mathcal{H}$. MISEs were 
calculated by averaging 1,000 numerical estimates of ISE values (19). For KDE, M is related to the 
kernel order via result (iii) of Theorem 1, and is therefore the approximate kernel order. 

Table 1: List of distributions for simulations. 

Name of X                               Distribution/Density of X 
--------------------------------------  ------------------------------------------------------- 
Beta(4,4) on [-1,1] (optimal by KDE)    $f(x) = \frac{35}{32}(1 - x^2)^3 I(|x| < 1)$ 
N(0,1)                                  $f(x) = \phi(x)$ 
Normal Mix 1 (bimodal)                  $X \sim \frac{3}{4} Z_1 + \frac{1}{4} Z_2$, 
                                        $Z_1 \sim N(0,1)$ and $Z_2 \sim N(3/2, 1/9)$ 
Exponential(1) (sharp boundary at 0)    $f(x) = e^{-x} I(x > 0)$ 
N(0,1) on [0, ∞) (truncated at 0)       $f(x) = 2\phi(x) I(x > 0)$ 
N(0,1) on [-1, ∞) (truncated at -1)     $f(x) = \frac{\phi(x)}{\Phi(1)} I(x > -1)$ 
Normal Mix 2 (sharp peak at 0)          $X \sim \frac{2}{3} Z_1 + \frac{1}{3} Z_2$, 
                                        $Z_1 \sim N(0,1)$ and $Z_2 \sim N(0, 1/100)$ 
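
The oracle search described in this section can be sketched for the simplest case of a Gaussian-kernel KDE with a single tuning parameter. This is only an illustration under stated assumptions: the function names are hypothetical, the ISE is approximated by a Riemann sum on a fixed grid, and 50 replications are used instead of 1,000 to keep the example fast. The actual study searches jointly over the degree M and the bandwidth h for LOrPE.

```python
import numpy as np

def kde(grid, data, h):
    """Gaussian KDE on a grid."""
    u = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2.0 * np.pi))

def oracle_bandwidth(true_pdf, sampler, n, bandwidths, grid, n_rep=200, seed=0):
    """Return (h_star, mise), where mise[j] averages the ISE over n_rep samples."""
    rng = np.random.default_rng(seed)
    dx = grid[1] - grid[0]
    f = true_pdf(grid)                      # the oracle knows the true density
    ise = np.zeros((n_rep, len(bandwidths)))
    for r in range(n_rep):
        data = sampler(rng, n)              # same sample reused for every bandwidth
        for j, h in enumerate(bandwidths):
            ise[r, j] = np.sum((kde(grid, data, h) - f) ** 2) * dx
    mise = ise.mean(axis=0)
    return bandwidths[np.argmin(mise)], mise

bandwidths = np.linspace(0.1, 1.0, 10)
grid = np.linspace(-5.0, 5.0, 201)
std_normal = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
h_star, mise = oracle_bandwidth(std_normal, lambda rng, n: rng.standard_normal(n),
                                n=100, bandwidths=bandwidths, grid=grid, n_rep=50)
```

Reusing the same simulated sample across all bandwidths reduces the Monte Carlo noise in the comparison between grid points, which matters when neighbouring MISE values are close.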

Figure 2 displays the resulting $\log_{10}(\mathrm{MISE}(h^*, M))$ values as a function of M, for 
each of LOrPE and KDE, at two sample sizes. According to these graphical summaries, 
LOrPE works in a similar manner to KDE when estimating densities with exponentially 
declining tails at both ends of the support, such as the N(0,1). Similar results were observed for 
the Beta(4,4), and the two Normal Mixes (not shown). For densities with sharp edges, LOrPE 
tends to attain lower MISE values than KDE. The N(0,1) truncated at 0 (with KDE mirroring) is 
a notable exception; but the better performance of KDE is only really discernible at larger sample 
sizes and higher kernel orders. If the crucial data mirroring property of KDE at the boundaries 
is removed, then the tables are turned in favor of LOrPE, particularly at small sample sizes and 
low kernel orders. The Exponential(1) constitutes a dramatic case in favor of LOrPE, while the 
N(0,1) truncated at -1 (with KDE benefiting from mirroring) is somewhere in between these two 
extremes. Note that LOrPE does not use data mirroring (although it can use kernel mirroring, 
whereby the weight function is reflected at the boundary and added to the non-reflected part). 

The appropriate minimum oracle $\log_{10}(\mathrm{MISE}(h^*, M^*))$ values for all the densities of 
Table 1 are displayed in Table 2, along with the corresponding optimal $(M^*, h^*)$. Note that all 
truncated N(0,1) KDE values were obtained using data mirroring, whereas those for the untruncated 
N(0,1) were not. As can be seen, at lower sample sizes all LOrPE estimates have lower (or the 
same) MISE, except for the truncated normals. However, this KDE advantage for the normal truncated 
at -1 gradually erodes, so that at higher sample sizes only the KDE estimates for the N(0,1) 
truncated at 0 persist in having lower MISE than LOrPE. 


5.2 Non-oracle MISE comparisons 

Figure 2: Oracle MISE values at two sample sizes, as a function of polynomial 
order M (LOrPE) or approximate kernel order M (KDE). 

The intent in this section is to compare LOrPE MISE values to those of its closest competitors, 
KDE, LLDE, and OSDE, in a realistic (non-oracle) setting. In order to make these comparisons as 
fair as possible in terms of mimicking an unsophisticated user, “reasonable” default settings were 
used for the respective tuning parameters of each method. The details are as follows. 


LOrPE: Uses the plug-in estimates from (18), implemented via the NPStat package (Volobouev, 
2012). 

KDE: Uses the Sheather & Jones (1991) two-stage plug-in (“dpi” or “direct plug-in”) bandwidth 
with a normal kernel and sample standard deviation as the estimate of scale, implemented 
via the R library ks. 

LLDE: Uses the above KDE plug-in bandwidth, a Gaussian kernel, and zero-order polynomial, 
implemented via the R library locfit. 

OSDE: The estimator in (13) was coded with the number of terms, J, chosen according to the 
Hart (1985) scheme. The NPStat package (Volobouev, 2012) is used to generate the necessary 
orthogonal polynomials on a grid (consisting of 2,048 points). The lowest and highest order 
statistics from the sample of size n are mapped to the 1/(2n) and 1 - 1/(2n) quantiles, 
respectively. All other points are then mapped linearly using these two extremes. The 
support of the density is then estimated by inversely mapping the [0,1] interval. Discrete 
analogs of the Legendre polynomials are employed, generated by the Gram-Schmidt procedure for 
a uniform weight on the grid in [0,1]. 

Table 2: Oracle $\log_{10}(\mathrm{MISE})$ values for the densities in Table 1 as a function of 
sample size. The values in parentheses correspond to the optimal $(M^*, h^*)$, where M is the 
polynomial order (LOrPE) or approximate kernel order (KDE) and h is the bandwidth. For each 
density and each n, the lowest of the two MISE values appears in bold face. 

                         $n = 10^2$               $n = 10^3$               $n = 10^4$ 
Distribution          LOrPE       KDE          LOrPE       KDE          LOrPE       KDE 
--------------------  ----------  ----------   ----------  ----------   ----------  ----------- 
N(0,1)                -2.441      -2.440       -3.260      -3.260       -4.177      -4.177 
                      (19, 11.4)  (13, 8.3)    (16, 8.2)   (17, 8.2)    (17, 7.0)   (17, 7.0) 
Normal Mix 1          -2.014      -2.014       -2.774      -2.774       -3.661      -3.661 
                      (0, 1.0)    (0, 1.0)     (4, 1.5)    (4, 1.5)     (11, 2.0)   (11, 2.0) 
Normal Mix 2          -1.378      -1.370       -2.208      -2.208       -3.108      -3.108 
                      (0, 0.25)   (2, 0.43)    (6, 0.51)   (6, 0.51)    (14, 0.74)  (14, 0.74) 
N(0,1) on [0, ∞)      -2.243      -2.475       -2.997      -3.318       -3.869      -4.213 
                      (0, 1.2)    (17, 9.7)    (2, 1.9)    (19, 8.6)    (4, 2.7)    (18, 7.5) 
N(0,1) on [-1, ∞)     -2.223      -2.241       -3.091      -2.609       -3.932      -3.010 
                      (2, 3.0)    (4, 3.9)     (2, 2.1)    (1, 0.79)    (2, 1.6)    (0, 0.19) 
Beta(4,4)             -2.044      -2.064       -2.890      -2.886       -3.824      -3.705 
                      (4, 13.0)   (8, 2.04)    (4, 1.5)    (10, 2.9)    (6, 11.6)   (9, 1.5) 
Exponential(1)        -2.265      -1.462       -3.085      -1.954       -4.002      -2.392 
                      (2, 4.1)    (0, 0.48)    (6, 13.2)   (0, 0.16)    (8, 13.7)   (0, 0.082) 
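
The grid-based OSDE ingredients described in this section (mapping the sample to [0,1] and generating discrete Legendre analogs by Gram-Schmidt under a uniform weight) can be sketched as follows. This is a minimal illustration and not the NPStat implementation: the function names are hypothetical, the grid is assumed to consist of cell midpoints, and the Hart (1985) rule for choosing J is not reproduced.

```python
import numpy as np

def map_to_unit_interval(sample):
    """Send the smallest/largest observation to the 1/(2n) and 1 - 1/(2n)
    quantile positions, and all other points linearly in between."""
    n = len(sample)
    lo, hi = sample.min(), sample.max()
    q_lo, q_hi = 1.0 / (2 * n), 1.0 - 1.0 / (2 * n)
    return q_lo + (sample - lo) * (q_hi - q_lo) / (hi - lo)

def discrete_legendre(n_grid=2048, max_degree=10):
    """Polynomials orthonormal under a uniform weight on a grid in [0, 1].

    Row k of the returned array holds the degree-k polynomial on the grid.
    """
    dx = 1.0 / n_grid
    x = (np.arange(n_grid) + 0.5) * dx      # assumed layout: cell midpoints
    basis = np.empty((max_degree + 1, n_grid))
    basis[0] = 1.0                          # unit norm, since sum(1 * dx) = 1
    for k in range(1, max_degree + 1):
        p = x * basis[k - 1]                # raises the degree by one
        for _ in range(2):                  # second pass guards against round-off
            for j in range(k):
                p = p - np.sum(p * basis[j]) * dx * basis[j]
        basis[k] = p / np.sqrt(np.sum(p * p) * dx)
    return x, basis

x, basis = discrete_legendre()
gram = basis @ basis.T / basis.shape[1]     # should be close to the identity
```

Starting each new polynomial from x times the previous one, instead of from the monomial x^k, keeps the Gram-Schmidt recursion numerically stable at higher degrees.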


MISEs were calculated empirically as in section 5.1. The data were once again simulated from 
most of the distributions in Table 1, as well as Student’s t with 1, 2, and 3 degrees of freedom 
truncated to the interval [-1, 2]. The results are presented in Table 3, which summarizes the 
$\log_{10}(\mathrm{MISE})$ values for three different sample sizes within each distribution. We 
note that LOrPE consistently yields the minimum MISE values for the sharply truncated normal 
distributions and the Exponential. For the truncated t distributions the results are mixed, but 
LOrPE tends to dominate for larger sample sizes. In nearly all cases where LOrPE does not yield 
the minimum MISE, it is a close second. 


5.3 Oracle and non-oracle MISE comparisons: LOrPE vs. KDE 

Recall that the LOrPE plug-in approach is meant to serve as an initial estimate in a more refined 
search for appropriate h and M values. Since plug-in formulae do not take boundary effects into 
account, we would expect sub-optimal performance from LOrPE in regard to estimation in the 
vicinity of the support boundary. The already good LOrPE plug-in performance seen in section 5.2 
could therefore potentially be improved by using cross-validation methods. Given that oracle 
comparisons provide lower bounds on MISE values, we may ask two interesting questions of LOrPE 
cross-validation methods: (i) how close can they get to LOrPE oracle values, and (ii) how close can 
they get to KDE oracle values? 
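
As a point of reference for the cross-validation idea, the following sketch implements the standard least-squares cross-validation score for a Gaussian-kernel KDE, where both terms have closed forms. It is an analog only: it is not the LOrPE criterion of the paper, the names `normal_pdf` and `lscv` are hypothetical, and the bandwidth grid is an arbitrary choice.

```python
import numpy as np

def normal_pdf(u, s):
    """Density of N(0, s^2) evaluated at u."""
    return np.exp(-0.5 * (u / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def lscv(h, data):
    """LSCV(h) = int fhat^2 dx - (2/n) * sum_i fhat_{-i}(x_i), Gaussian kernel."""
    n = len(data)
    d = data[:, None] - data[None, :]
    # int fhat^2: the convolution of two N(0, h^2) kernels is N(0, 2 h^2).
    int_f2 = normal_pdf(d, np.sqrt(2.0) * h).sum() / n**2
    # Leave-one-out estimates: drop the i = j term from each row sum.
    loo = (normal_pdf(d, h).sum(axis=1) - normal_pdf(0.0, h)) / (n - 1)
    return int_f2 - 2.0 * loo.mean()

rng = np.random.default_rng(1)
data = rng.standard_normal(300)
hs = np.linspace(0.05, 1.5, 30)
scores = np.array([lscv(h, data) for h in hs])
h_lscv = hs[np.argmin(scores)]              # data-driven bandwidth choice
```

The score estimates the MISE up to the constant term given by the integral of the squared true density, so minimizing it over h requires no knowledge of the truth, unlike the oracle search.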

This section aims to answer these questions, using both the LSCV and RLCV criteria, as 
described by equations (20) and (21), respectively, with the regularization parameter set at 
$\alpha = 0.5$ in the latter. Both oracle and non-oracle methods are considered, and as such the 
simulation details for the former parallel those of section 5.1, while those for the latter are 
identical to section 5.2. For KDE oracle computations, the N(0,1) and the N(0,1) truncated at -1 
did not use data mirroring, while the N(0,1) truncated at 0 used mirroring. This time a variety of 
sample sizes were considered in order to reveal any possible convergence of methods as 
$n \to \infty$. Also, for brevity only 6 of the (representative) distributions listed in Table 3 
were examined. 

Table 3: Non-oracle $\log_{10}(\mathrm{MISE})$ values for 4 estimators of the true density. MISEs 
are based on 1,000 realizations simulated from a variety of distributions and sample sizes (n). 
For each distribution and each n, the lowest of the 4 MISE values appears in bold face. 

Distribution         $\log_{10}(n)$   LOrPE     KDE       LLDE      OSDE 
-------------------  --------------   ------    ------    ------    ------- 
N(0,1)               2                -2.138    -2.198    -2.183    -1.634 
                     3                -3.088    -2.973    -3.179    -1.650 
                     4                -4.045    -3.741    -4.158    -2.652 
N(0,1) on [0, ∞)     2                -2.177    -1.576    -1.427    -1.666 
                     3                -2.923    -2.010    -1.594    -2.642 
                     4                -3.770    -2.392    -1.613    -3.634 
N(0,1) on [-1, ∞)    2                -2.085    -2.023    -1.837    -1.823 
                     3                -3.005    -2.564    -2.188    -2.799 
                     4                -3.874    -2.980    -2.248    -3.776 
Normal Mix 1         2                -1.743    -1.888    -1.824    -1.148 
                     3                -2.108    -2.223    -2.028    -1.149 
                     4                -2.477    -2.278    -2.060    -1.149 
Exponential(1)       2                -2.239    -1.374    -1.299    -0.677 
                     3                -2.915    -1.783    -1.386    -1.2328 
                     4                -3.740    -2.157    -1.393    -0.679 
t(1) on [-1,2]       2                -1.891    -2.317    -2.447    -1.347 
                     3                -2.712    -2.694    -3.000    -2.337 
                     4                -3.661    -3.118    -3.128    -3.337 
t(2) on [-1,2]       2                -1.980    -2.346    -2.366    -1.408 
                     3                -2.839    -3.065    -3.191    -2.404 
                     4                -3.724    -3.546    -3.591    -3.400 
t(3) on [-1,2]       2                -2.039    -2.328    -2.289    -1.437 
                     3                -2.879    -3.061    -3.207    -2.427 
                     4                -3.769    -3.856    -3.763    -3.416 

The resulting $\log_{10}(\mathrm{MISE})$ values appear plotted vs. sample size in Figure 3. The 
answers to the above two questions seem clear. First, LOrPE cross-validation methods come very 
close to LOrPE oracle values, with the RLCV criterion dominating LSCV most of the time. Secondly, 
and remarkably, except for the N(0,1) and the N(0,1) truncated at 0, LOrPE cross-validation methods 
produce consistently lower MISE values than KDE oracle. 

In some cases, and especially at small sample sizes, the LOrPE-RLCV method may not be 
achieving the lowest possible MISE. One reason for this could be that the regularization parameter 
choice of $\alpha = 0.5$ is not optimal. To investigate this issue, Figure 4 plots the 
$\log_{10}(\mathrm{MISE})$ values vs. $\alpha \in [0,1]$ for the distributions considered in 
Figure 3, at a single sample size. The error bars around each value extend from the 84.13th to the 
15.87th percentiles divided by $2\sqrt{n}$, and provide a sense of sampling variability through a 
robust measure of the standard error. It is clear that, perhaps with the exception of the N(0,1) 
case, LOrPE-RLCV is reasonably insensitive to the choice of $\alpha$. This suggests that it may 
not be necessary to estimate this extra tuning parameter, 
and that a default value of $\alpha = 0.5$ can simply be used. 

Figure 3: Plots of oracle (solid lines) and non-oracle $\log_{10}(\mathrm{MISE})$ values for LOrPE 
and KDE, plotted against sample size (n). The non-oracle methods of LSCV (dashed lines) and RLCV 
(dotted lines) apply only to LOrPE. Panels: N(0,1); N(0,1) truncated at 0; N(0,1) truncated at -1; 
Exponential(1); Truncated t(1); Truncated t(3). 
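
The percentile-based standard error used for the error bars in Figure 4 can be sketched as follows. The function name is hypothetical, and the sketch assumes that the n in the divisor is the number of replicated values being summarized. For normally distributed values the 15.87th and 84.13th percentiles sit one standard deviation either side of the mean, so half their distance estimates the standard deviation without being sensitive to a few extreme replications.

```python
import numpy as np

def robust_standard_error(values):
    """(q_84.13 - q_15.87) / (2 * sqrt(n)) for a vector of replicated values."""
    values = np.asarray(values, dtype=float)
    q_lo, q_hi = np.percentile(values, [15.87, 84.13])
    return (q_hi - q_lo) / (2.0 * np.sqrt(len(values)))

rng = np.random.default_rng(0)
replicates = rng.normal(loc=0.0, scale=2.0, size=100_000)
se = robust_standard_error(replicates)      # close to 2 / sqrt(100000)
```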


6 Real Data Application 


As an illustration of the proposed methodology, we consider the lengths of n = 86 spells of 
psychiatric treatment (days) undergone by patients used as controls in a study of suicide risks 
(Copas & Fryer, 1980). The data were presented by Silverman (1986, Table 2.1), who used them to 
demonstrate certain inadequacies with KDE. Scaled to the unit interval by dividing all observations 
by the largest value of 737, the data set is publicly available in the R library bde as 
“suicide.r”. Figure 5 displays a histogram with rugplot, and five density estimates. Sturges’ 
formula is used to compute the breaks and number of classes in the histogram shaded in gray (the 
default in R function “hist”). 
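
For reference, Sturges’ formula gives the number of classes as the ceiling of $\log_2(n) + 1$. A minimal sketch follows; the function name is hypothetical, and R’s “hist” may afterwards adjust the breaks to round values, so the number of bars actually drawn can differ from this count.

```python
import math

def sturges_classes(n):
    """Number of histogram classes by Sturges' formula: ceil(log2(n) + 1)."""
    return math.ceil(math.log2(n) + 1)

n_classes = sturges_classes(86)   # log2(86) is about 6.43, so this gives 8
```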

KDE (red dashed lines) uses the plug-in bandwidth as described in section 5.2. As expected, 
there is an apparent bias at the left end of the support: the estimate dips down toward zero, 
whereas the data suggest there should be a large amount of mass in that vicinity. A similar 
outcome occurs with LLDE (purple dotdash lines), which displays less “wiggliness” in the tail, but 
a sharp “kink” at the peak. As suggested by Loader (1999), greater care was exercised in selecting 
appropriate values for the LLDE tuning parameters: we used AIC to identify the optimal nearest 
neighbor component of the smoothing parameter and polynomial order, instead of the (quicker) 
KDE plug-in bandwidth and degree zero of section 5.2, as a means of specifying the effective 
degrees of freedom. No appreciable changes were observed with kernels different from Gaussian. 
OSDE (green dotted lines) obviously undersmooths badly, a consequence of the degree preferred by 
Hart’s (1985) method being J = 48. 

Figure 4: Plots of $\log_{10}(\mathrm{MISE})$ values vs. the regularization parameter $\alpha$ for 
the LOrPE-RLCV method applied to 1,000 simulated datasets at a single sample size. The error bars 
provide a robust measure of the standard error. 

Rather more believable performance was obtained with Chen’s (1999) boundary-corrected 
beta-kernel density estimator (blue longdash lines), which picks up the mass at the peak, but seems 
to be somewhat oversmoothed. This is Chen’s (1999) second beta-kernel estimator, called “modified” 
in the R library bde with which it is implemented, since Chen (1999) showed it consistently 
outperforms the first beta-kernel estimator. The critical bandwidth tuning parameter b is set at 
its default value, the AMISE-optimal order for such kernels (Chen, 1999). Finally, we 
note the arguably superior performance of LOrPE (black dashed and solid lines). LOrPE-RLCV 
uses the default value of $\alpha = 0.5$ for the regularization parameter, as suggested by the 
simulations in section 5.3, and the optimal degree and bandwidth were M = 7 and h = 2047.2. 
LOrPE-LSCV delivers a similar performance with M = 2.9 and h = 605.3. 


7 Summary Remarks 

Figure 5: Density estimates for the Suicide data: KDE (red/dashed), Chen (1999) (blue/longdash), 
LLDE (purple/dotdash), OSDE (green/dotted), LOrPE-LSCV (black/dashed), LOrPE-RLCV 
(black/solid). 

We have shown that LOrPE is a useful extension to the (already vast) array of tools for 
nonparametric density estimation. This novel idea has at its basis the local expansion of the EDF 
into a series of orthogonal polynomials around a selection of grid points. It was demonstrated that 
away from the support boundary LOrPE essentially functions like KDE with a high-order kernel, 
whereas close to the boundary LOrPE is adaptive in the sense that its effective kernels naturally 
change shape to accommodate the endpoint, thereby reducing boundary bias. Faster asymptotic 
convergence rates follow naturally by virtue of the higher-order kernels. LOrPE also shares 
important connections with LLDE and OSDE. Simulations demonstrated that LOrPE generally outperforms 
these estimators, and especially KDE, when estimating densities with sharp boundaries. Also, 
LOrPE allows for the inclusion of a taper function, a feature which takes LOrPE beyond KDE with 
high-order kernels. 

These reasons make LOrPE applicable in a wider range of problems than KDE. When estimating 
distributions which decay rapidly at infinity, LOrPE results are identical to KDE. Additionally, 
the local polynomial modeling can effectively reduce the bias for densities with several (at least 
M) continuous derivatives. A proper balance of h and M can thus result in a better overall 
estimator. Cross-validation, and especially a regularized version of likelihood cross-validation, 
seems to be a promising way of selecting appropriate values for these tuning parameters. For large 
n, the simulations suggest LOrPE MISE approaches the oracle (or “best case”) MISE. Finally, 
LOrPE calculations remain essentially unchanged in multivariate settings, requiring only a switch 
to multivariate orthogonal polynomial systems. 

A Proof of Theorem 1 

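Substituting the expression for $c_k(\cdot)$ into the definition of the LOrPE estimate gives 

$$
\begin{aligned}
\hat{f}_{\mathrm{LOrPE}}(x)
&= \sum_{k=0}^{\infty} t(k)\, c_k(x_{\mathrm{fit}}, h)\, P_k\!\left(\frac{x - x_{\mathrm{fit}}}{h}\right) \\
&= \sum_{k=0}^{\infty} t(k) \left[ \frac{1}{nh} \sum_{i=1}^{n} P_k\!\left(\frac{x_i - x_{\mathrm{fit}}}{h}\right) K\!\left(\frac{x_i - x_{\mathrm{fit}}}{h}\right) \right] P_k\!\left(\frac{x - x_{\mathrm{fit}}}{h}\right) \\
&= \frac{1}{nh} \sum_{i=1}^{n} \left\{ \sum_{k=0}^{\infty} t(k)\, P_k\!\left(\frac{x_i - x_{\mathrm{fit}}}{h}\right) P_k\!\left(\frac{x - x_{\mathrm{fit}}}{h}\right) K\!\left(\frac{x_i - x_{\mathrm{fit}}}{h}\right) \right\} \\
&= \frac{1}{nh} \sum_{i=1}^{n} K_{\mathrm{eff}}\!\left(\frac{x - x_i}{h}\right).
\end{aligned}
$$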


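Defining $y = (x_{\mathrm{fit}} - x_i)/h$, evaluate $K_{\mathrm{eff}}$ at the grid point to see that 

$$
K_{\mathrm{eff}}\!\left(\frac{x_{\mathrm{fit}} - x_i}{h}\right) = K_{\mathrm{eff}}(y)
= \sum_{k=0}^{\infty} t(k)\, P_k(0)\, P_k(-y)\, K(-y).
$$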


To establish (i)-(iii), note that Assumptions (a) and (b) imply that $P_k(x)$ is an even (odd) 
function for any even (odd) integer k. This means $P_k(-x) = P_k(x)$ for k even, and $P_k(0) = 0$ 
for k odd, so that the effective kernel becomes 

$$
K_{\mathrm{eff}}(x) = \sum_{\{k \,:\, k \ge 0,\ k\ \mathrm{even}\}} t(k)\, P_k(0)\, P_k(x)\, K(x), \tag{23}
$$


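and $K_{\mathrm{eff}}(-x) = K_{\mathrm{eff}}(x)$ is an even function supported also on 
$(-a_K, a_K)$, thus establishing (i). Now, multiplying both sides of the above equation by 
$P_0(x) = 1$ and integrating gives 

$$
\int_{\mathbb{R}} K_{\mathrm{eff}}(x)\,dx = \int_{-a_K}^{a_K} K_{\mathrm{eff}}(x) P_0(x)\,dx
= \int_{a_{\mathrm{fit}}}^{b_{\mathrm{fit}}} \sum_{\{k \,:\, k \ge 0,\ k\ \mathrm{even}\}} t(k)\, P_k(0)\, P_0(x)\, P_k(x)\, K(x)\,dx,
$$

which follows by Assumption (c). Now, interchanging integral and sum in the above expression 
and then using the orthonormality of the polynomials establishes (ii) as follows: 

$$
\begin{aligned}
\int_{\mathbb{R}} K_{\mathrm{eff}}(x)\,dx
&= \sum_{\{k \,:\, k \ge 0,\ k\ \mathrm{even}\}} t(k)\, P_k(0) \int_{a_{\mathrm{fit}}}^{b_{\mathrm{fit}}} P_0(x) P_k(x) K(x)\,dx \\
&= \sum_{\{k \,:\, k \ge 0,\ k\ \mathrm{even}\}} t(k)\, P_k(0)\, \delta_{0k} \\
&= t(0) P_0(0) = t(0).
\end{aligned}
$$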

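To prove (iii), first define the j-th kernel moment as 

$$
\mu_j(K_{\mathrm{eff}}) = \int_{\mathbb{R}} x^j K_{\mathrm{eff}}(x)\,dx.
$$

Now, since the effective kernel is an even function, it is clear that $\mu_j(K_{\mathrm{eff}}) = 0$ 
for j odd. Hence, it suffices to consider the case when j is even, whence 

$$
\mu_j(K_{\mathrm{eff}}) = \int_{\mathbb{R}} x^j K_{\mathrm{eff}}(x)\,dx
= \sum_{\{k \,:\, 0 \le k \le M,\ k\ \mathrm{even}\}} P_k(0) \int_{a_{\mathrm{fit}}}^{b_{\mathrm{fit}}} x^j P_k(x) K(x)\,dx
= \sum_{\{k \,:\, 0 \le k \le M,\ k\ \mathrm{even}\}} \alpha_{jk} P_k(0),
$$

if we define 

$$
\alpha_{jk} = \int_{-a_K}^{a_K} x^j P_k(x) K(x)\,dx = \int_{a_{\mathrm{fit}}}^{b_{\mathrm{fit}}} x^j P_k(x) K(x)\,dx.
$$

Now, from the theory of orthogonal polynomials, we know that 

$$
x^j = \sum_{k=0}^{\infty} a_{jk} P_k(x), \quad \text{where} \quad
a_{jk} = \int_{a_{\mathrm{fit}}}^{b_{\mathrm{fit}}} x^j P_k(x) K(x)\,dx = \alpha_{jk}.
$$

Since $\alpha_{jk}$ is the coefficient of the $P_k(x)$ contribution (a polynomial of order k) to the 
series expansion of $x^j$, it is obvious that $\alpha_{jk} = 0$ for $k > j$, and $\alpha_{jk} = 0$ 
when k and j have opposite parity (only even k terms contribute when j is even, and vice-versa). 
With these observations, it is clear that for $j \le M$ 

$$
\mu_j(K_{\mathrm{eff}}) = \sum_{k=0}^{M} \alpha_{jk} P_k(0), \quad \text{and} \quad
x^j = \sum_{k=0}^{M} \alpha_{jk} P_k(x),
$$

whence we see that 

$$
\mu_j(K_{\mathrm{eff}}) = x^j \big|_{x=0} =
\begin{cases}
1, & j = 0, \\
0, & j = 1, \ldots, M.
\end{cases}
$$

If M is even, then since M + 1 is odd and $K_{\mathrm{eff}}(x)$ is an even function, we have 
additionally that $\mu_{M+1}(K_{\mathrm{eff}}) = 0$. Thus the effective kernel order is M + 1 if M 
is odd, and M + 2 if M is even. 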
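
The moment conditions just derived can be checked numerically. The sketch below is an illustration under stated assumptions: an Epanechnikov kernel on [-1, 1], a fit point far from the support boundary, and a taper equal to 1 for all degrees up to M. The function name `effective_kernel` is hypothetical. The polynomials orthonormal with respect to the kernel are obtained from a QR factorization of a weighted Vandermonde matrix at Gauss-Legendre nodes, which integrates the polynomial integrands involved exactly.

```python
import numpy as np

def effective_kernel(kernel, M, n_nodes=64):
    """Return Gauss-Legendre nodes, weights, and K_eff at the nodes on [-1, 1]."""
    x, w = np.polynomial.legendre.leggauss(n_nodes)
    sw = np.sqrt(w * kernel(x))                  # square root of quadrature weight times K
    V = np.vander(x, M + 1, increasing=True)     # columns 1, x, ..., x^M
    _, R = np.linalg.qr(V * sw[:, None])
    coef = np.linalg.inv(R)                      # column k: monomial coefficients of P_k
    P = V @ coef                                 # P[:, k] = P_k at the nodes
    P_at_zero = coef[0, :]                       # P_k(0) is the constant coefficient
    # K_eff(x) = sum_k P_k(0) P_k(x) K(x); odd-k terms vanish since P_k(0) = 0.
    k_eff = (P * P_at_zero).sum(axis=1) * kernel(x)
    return x, w, k_eff

epanechnikov = lambda u: 0.75 * (1.0 - u**2)
x, w, k_eff = effective_kernel(epanechnikov, M=4)
# Moments of orders 0..6: the first is 1, orders 1..5 vanish, order 6 does not.
moments = np.array([np.sum(w * x**j * k_eff) for j in range(7)])
```

With M = 4 the first non-vanishing moment beyond order zero is the sixth, in line with an effective kernel order of M + 2 for even M.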


B Proof of Theorem 

As $h \to \infty$ the value of the kernel $K(\cdot)$ becomes less and less dependent on the grid 
point $x_{\mathrm{fit}}$ inside [a, b]. In fact, for very large h, 
$K((x_i - x_{\mathrm{fit}})/h)$ essentially becomes constant on [a, b]. Equation (4) then gives 
rise to Legendre polynomials since these are generated when integrating with respect to a constant 
weight function, in a manner similar to the earlier Proposition. To see this, start with the 
orthonormal Legendre polynomials $L_k(z)$ on [-1, 1], satisfying 

$$
\delta_{jk} = \int_{-1}^{1} L_j(z) L_k(z)\,dz. \tag{24}
$$

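To construct the corresponding orthonormal system on [a, b], we make the transformation 
$z = (2x - a - b)/(b - a)$, so that (24) becomes 

$$
\delta_{jk} = \frac{2}{b - a} \int_a^b L_j\!\left(\frac{2x - a - b}{b - a}\right) L_k\!\left(\frac{2x - a - b}{b - a}\right) dx
= \int_a^b P_j(x) P_k(x)\,dx, \tag{25}
$$

where 

$$
P_k(x) = \sqrt{\frac{2}{b - a}}\; L_k\!\left(\frac{2x - a - b}{b - a}\right). \tag{26}
$$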
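
The orthonormality claimed in (25) for the rescaled polynomials (26) is easy to confirm numerically. The sketch below uses NumPy's Legendre utilities and Gauss-Legendre quadrature; the function names and the interval [-1, 2] are illustrative choices only.

```python
import numpy as np
from numpy.polynomial import legendre

def orthonormal_legendre(k, z):
    """Orthonormal Legendre polynomial on [-1, 1]: sqrt((2k+1)/2) * P_k(z)."""
    coef = np.zeros(k + 1)
    coef[k] = 1.0
    return np.sqrt((2 * k + 1) / 2.0) * legendre.legval(z, coef)

def shifted_basis(k, x, a, b):
    """P_k(x) = sqrt(2/(b-a)) * L_k((2x - a - b)/(b - a)), as in (26)."""
    z = (2.0 * x - a - b) / (b - a)
    return np.sqrt(2.0 / (b - a)) * orthonormal_legendre(k, z)

a, b = -1.0, 2.0
z, w = legendre.leggauss(20)                 # nodes and weights on [-1, 1]
x = 0.5 * ((b - a) * z + a + b)              # nodes mapped to [a, b]
wx = 0.5 * (b - a) * w                       # weights rescaled by dx/dz
B = np.array([shifted_basis(k, x, a, b) for k in range(6)])
gram = (B * wx) @ B.T                        # approximates int_a^b P_j P_k dx
```

The matrix `gram` should agree with the identity to rounding error, since a 20-node Gauss-Legendre rule integrates these polynomial products exactly.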


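Now construct an orthonormal system on the interval $[a_{\mathrm{fit}}, b_{\mathrm{fit}}]$ using 
K(0) as the weight function instead of 1. By means of the transformation 
$y = (x - x_{\mathrm{fit}})/h$, (25) then becomes 

$$
\begin{aligned}
\delta_{jk}
&= \int_a^b \frac{P_j(x)}{\sqrt{K(0)}}\, \frac{P_k(x)}{\sqrt{K(0)}}\, K(0)\,dx \\
&= \int_{a_{\mathrm{fit}}}^{b_{\mathrm{fit}}} \sqrt{\frac{h}{K(0)}}\, P_j(yh + x_{\mathrm{fit}})\, \sqrt{\frac{h}{K(0)}}\, P_k(yh + x_{\mathrm{fit}})\, K(0)\,dy \\
&= \int_{a_{\mathrm{fit}}}^{b_{\mathrm{fit}}} \tilde{P}_j(y)\, \tilde{P}_k(y)\, K(0)\,dy,
\end{aligned}
$$

where 

$$
\tilde{P}_k(y) = \sqrt{\frac{2h}{(b - a) K(0)}}\; L_k\!\left(\frac{2yh + 2x_{\mathrm{fit}} - a - b}{b - a}\right), \tag{27}
$$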


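which follows from (26). Now, from the proof of Theorem 1 we have the following expression for 
LOrPE: 

$$
\hat{f}_{\mathrm{LOrPE}}(x) = \frac{1}{nh} \sum_{i=1}^{n} \sum_{k=0}^{M}
P_k\!\left(\frac{x_i - x_{\mathrm{fit}}}{h}\right) P_k\!\left(\frac{x - x_{\mathrm{fit}}}{h}\right) K\!\left(\frac{x_i - x_{\mathrm{fit}}}{h}\right).
$$

Substituting $\tilde{P}_k(\cdot)$ for $P_k(\cdot)$ in the above equation gives 

$$
\hat{f}_{\mathrm{LOrPE}}(x) = \frac{1}{n} \sum_{i=1}^{n} \sum_{k=0}^{M} \frac{2}{(b - a) K(0)}\,
L_k\!\left(\frac{2x_i - a - b}{b - a}\right) L_k\!\left(\frac{2x - a - b}{b - a}\right) K\!\left(\frac{x_i - x_{\mathrm{fit}}}{h}\right).
$$

Since 

$$
\lim_{h \to \infty} K\!\left(\frac{x_i - x_{\mathrm{fit}}}{h}\right) = K(0),
$$

we obtain 

$$
\hat{f}_{\mathrm{LOrPE}}(x) = \frac{1}{n} \sum_{i=1}^{n} \sum_{k=0}^{M} \frac{2}{b - a}\,
L_k\!\left(\frac{2x - a - b}{b - a}\right) L_k\!\left(\frac{2x_i - a - b}{b - a}\right),
$$

which is the classical OSDE (13) in terms of the orthogonal polynomials 

$$
\varphi_k(x) = \sqrt{\frac{2}{b - a}}\; L_k\!\left(\frac{2x - a - b}{b - a}\right). \tag{32}
$$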